
81 research outputs found

P2P XQuery and the StreetTiVo application

Author: Boncz Peter A.
Zhang Yi
Publication venue: Dagstuhl Seminar Proceedings. 06431 - Scalable Data Management in Evolving Networks
Publication date: 01/01/2007

MonetDB/XQuery is a fully functional, publicly available XML DBMS that has been extended with distributed and P2P data management functionality. Our (minimal) XQuery language extension, XRPC, adds the concept of RPC to XQuery, and we outlined our approach to including the services offered by diverse P2P network structures (such as DHTs) in a way that avoids any further intrusion into the XQuery language and semantics. We also discussed the StreetTiVo application, where MonetDB/XQuery is being used for data management in a large P2P environment.

Dagstuhl Research Online Publication Server

06472 Abstracts Collection - XQuery Implementation Paradigms

Author: Boncz Peter A.
Grust Torsten
Keulen Maurice van
Siméon Jerome
Publication venue: Internationales Begegnungs- und Forschungszentrum fuer Informatik (IBFI)
Publication date: 01/01/2007

From 19.11.2006 to 22.11.2006, the Dagstuhl Seminar 06472 "XQuery Implementation Paradigms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

CWI's Institutional Repository

Dagstuhl Research Online Publication Server

University of Twente Research Information

06472 Executive Summary - XQuery Implementation Paradigms

Author: Boncz Peter A.
Grust Torsten
Siméon Jerome
Publication venue: Internationales Begegnungs- und Forschungszentrum für Informatik
Publication date: 01/01/2007

University of Twente Research Information

06472 Abstracts Collection - XQuery Implementation Paradigms

Author: Boncz Peter A.
Grust Torsten
Siméon Jerome
Publication venue: Internationales Begegnungs- und Forschungszentrum für Informatik
Publication date: 01/01/2007

University of Twente Research Information

Welcome to Sigmod 2019 - The 2019 ACM SIGMOD International Conference on the Management of Data!

Author: Ailamaki A. (Anastasia)
Boncz P.A. (Peter)
Manegold S. (Stefan)
Publication date: 30/06/2019

CWI's Institutional Repository

Report on the Second International Workshop on Data Management on Modern Hardware (DaMoN'06)

Author: Ailamaki A. (Anastasia)
Boncz P.A. (Peter)
Manegold S. (Stefan)
Publication venue: A.C.M.
Publication date: 01/12/2006

This report summarizes the presentations and discussions that occurred during the Second International Workshop on Data Management on Modern Hardware (DaMoN). DaMoN was held in Chicago on June 25th, 2006, and was collocated with ACM SIGMOD 2006. The aim of this one-day workshop is to bring together researchers interested in optimizing database performance on modern computing infrastructure by designing new data management techniques and tools

CWI's Institutional Repository

[Demo] Low-latency spark queries on updatable data

Author: Boncz P.A. (Peter)
Dave A. (Ankur)
Ghit B. (Bogdan)
Uta A. (Alexandru)
Publication venue: Association for Computing Machinery (ACM)
Publication date: 25/06/2019

As data science gets deployed more and more into operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, in datasets that are continuously growing.
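The idea of pairing cached rows with an index can be illustrated with a small sketch. This is not the Indexed DataFrame API and does not use Spark; the class `IndexedTable` and its methods are hypothetical names chosen for illustration, and multi-version concurrency is left out entirely. It only shows why an index turns lookups and joins into probes rather than full scans.

```python
from collections import defaultdict


class IndexedTable:
    """Toy in-memory table with a hash index on one key column (hypothetical)."""

    def __init__(self, key):
        self.key = key
        self.rows = []                  # append-only row storage (the "cache")
        self.index = defaultdict(list)  # key value -> positions in self.rows

    def append(self, row):
        # Record the position of the new row under its key, then store the row.
        self.index[row[self.key]].append(len(self.rows))
        self.rows.append(row)

    def lookup(self, key_value):
        # Probe the index instead of scanning every row.
        return [self.rows[pos] for pos in self.index.get(key_value, [])]

    def join(self, probe_rows, probe_key):
        # Index join: one index probe per row on the probe side.
        return [(p, r) for p in probe_rows for r in self.lookup(p[probe_key])]


edges = IndexedTable("src")
for src, dst in [(1, 2), (1, 3), (2, 3)]:
    edges.append({"src": src, "dst": dst})

neighbours_of_1 = edges.lookup(1)
joined = edges.join([{"id": 2}], "id")
```

Because rows are only appended, the table can keep growing while earlier index entries stay valid, which loosely mirrors the continuously growing datasets mentioned in the abstract.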

VU Research Portal

Crossref

CWI's Institutional Repository

The FastLanes Compression Layout: Decoding >100 billion integers per second with scalar code

Author: Afroozeh A. (Azim)
Boncz P.A. (Peter)
Publication date: 12/04/2023

The open-source FastLanes project aims to improve big data formats, such as Parquet, ORC and columnar database formats, in multiple ways. In this paper, we significantly accelerate decoding of all common Light-Weight Compression (LWC) schemes: DICT, FOR, DELTA and RLE through better data-parallelism. We do so by re-designing the compression layout using two main ideas: (i) generalizing the value interleaving technique in the basic operation of bit-(un)packing by targeting a virtual 1024-bit SIMD register, (ii) reordering the tuples in all columns of a table in the same Unified Transposed Layout that puts tuple chunks in a common “04261537” order (explained in the paper), allowing for maximum independent work for all possible basic SIMD lane widths: 8, 16, 32, and 64 bits. We address the software development, maintenance and future-proofness challenges of increasing hardware diversity by defining a virtual 1024-bit instruction set that consists of simple operators supported by all SIMD dialects and also, importantly, by scalar code. The interleaved and tuple-reordered layout actually makes scalar decoding faster, extracting more data-parallelism from today’s wide-issue CPUs. Importantly, the scalar version can be fully auto-vectorized by modern compilers, eliminating technical debt in software caused by platform-specific SIMD intrinsics. Micro-benchmarks on Intel, AMD, Apple and AWS CPUs show that FastLanes accelerates decoding by factors (decoding >40 values per CPU cycle). FastLanes can make queries faster, as compressing the data reduces bandwidth needs, while decoding is almost free.
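As a rough illustration of two of the building blocks named in the abstract, the sketch below combines frame-of-reference (FOR) encoding with plain bit-packing in scalar code. It is not the FastLanes layout: it does no value interleaving and no tuple reordering, and the function names and the single-integer packed representation are assumptions made for readability. It also assumes a non-empty list of integers.

```python
def for_bitpack_encode(values):
    """FOR-encode values against their minimum, then bit-pack the residuals."""
    base = min(values)                           # the frame of reference
    residuals = [v - base for v in values]       # small non-negative offsets
    width = max(1, max(residuals).bit_length())  # bits needed per residual
    packed = 0
    for i, r in enumerate(residuals):
        packed |= r << (i * width)               # value i sits at bit offset i*width
    return base, width, len(values), packed


def for_bitpack_decode(base, width, count, packed):
    """Unpack each fixed-width residual and add the base back."""
    mask = (1 << width) - 1
    return [((packed >> (i * width)) & mask) + base for i in range(count)]


values = [1003, 1000, 1007, 1001]
encoded = for_bitpack_encode(values)
decoded = for_bitpack_decode(*encoded)
```

Here the four values need only 3 bits each after subtracting the base of 1000, instead of a full machine word. Each decoded value depends only on its own bit offset, which is the kind of independent work that the abstract says the layout is designed to expose.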

CWI's Institutional Repository

JCC-H: Adding Join Crossing Correlations with skew to TPC-H

Author: Anatiotis A.-C. (Angelos-Christos)
Boncz P.A. (Peter)
Kläbe S. (Steffen)
Publication venue: Springer Science and Business Media LLC
Publication date: 28/08/2017

We introduce JCC-H, a drop-in replacement for the data and query generator of TPC-H that introduces Join-Crossing-Correlations (JCC) and skew into its dataset and query workload. These correlations are carefully designed such that the filter predicates on table columns in the existing TPC-H queries can now affect the value, frequency and join-fan-out distributions experienced by operators in the query plan. The query generator of JCC-H is able to generate parameter bindings for the 22 query templates in two different equivalence classes: query templates that receive “normal” parameters do not experience skew and behave very similarly to default TPC-H queries. Query templates expanded with the “skewed” parameters, though, experience strong join-crossing-correlations and skew in filter, aggregation and join operations. In this paper we discuss the goals of JCC-H and its detailed design, and show initial experiments on both a single-server and an MPP database system that confirm that our design goals were largely met. In all, JCC-H provides a convenient way for any system that is already testing with TPC-H to examine how the system can handle skew and correlations, so we hope the community can use it to make progress on issues like skew mitigation and detection and exploitation of join-crossing-correlations in query optimizers and data storage.

CWI's Institutional Repository

The FastLanes compression layout: Decoding >100 billion integers per second with scalar code

Author: Afroozeh A. (Azim)
Boncz P.A. (Peter)
Publication date: 10/07/2023

The open-source FastLanes project aims to improve big data formats, such as Parquet, ORC and columnar database formats, in multiple ways. In this paper, we significantly accelerate decoding of all common Light-Weight Compression (LWC) schemes: DICT, FOR, DELTA and RLE through better data-parallelism. We do so by re-designing the compression layout using two main ideas: (i) generalizing the value interleaving technique in the basic operation of bit-(un)packing by targeting a virtual 1024-bit SIMD register, (ii) reordering the tuples in all columns of a table in the same Unified Transposed Layout that puts tuple chunks in a common "04261537" order (explained in the paper), allowing for maximum independent work for all possible basic SIMD lane widths: 8, 16, 32, and 64 bits. We address the software development, maintenance and future-proofness challenges of increasing hardware diversity by defining a virtual 1024-bit instruction set that consists of simple operators supported by all SIMD dialects and also, importantly, by scalar code. The interleaved and tuple-reordered layout actually makes scalar decoding faster, extracting more data-parallelism from today’s wide-issue CPUs. Importantly, the scalar version can be fully auto-vectorized by modern compilers, eliminating technical debt in software caused by platform-specific SIMD intrinsics. Micro-benchmarks on Intel, AMD, Apple and AWS CPUs show that FastLanes accelerates decoding by factors (decoding >40 values per CPU cycle). FastLanes can make queries faster, as compressing the data reduces bandwidth needs, while decoding is almost free.

CWI's Institutional Repository